- This article and the accompanying programs were typed in from the August 1988
- (#142) issue of Dr. Dobb's Journal. The author and programmer is Marvin
- Hymowech, a contributing author for the magazine. References to wildcards
- in the text below do not apply exactly as the author describes. According to
- the article, BLDFUNCS must be linked with a library module supplied with
- Microsoft C in order to expand wildcards. The BLDFUNCS.C in FFUNCS1.ARC,
- which accompanies this file, accepts wildcards without that module, unlike
- the BLDFUNCS.C in FFUNCS.ARC.
- Every function mentioned in the text below whose name begins with filter_
- has been renamed in the code to begin with flt_, so that the program works
- with the Mix Power C debugger, which recognizes only the first 8 characters
- of function names (and possibly of variable names; this is unconfirmed).
- _________________________________________________________________________
-
- ***** FIND THAT FUNCTION *****
-
- You're a C programmer who has just changed employers. You are suddenly
- responsible for maintaining more than 100 source code files and perhaps three
- times as many C functions. In wandering through your new wealth of source
- code, you find a reference to a function called get_input(), which sounds
- like a good thing to look at next. But which file is it in? Or perhaps you
- recall seeing a function called save_screen() (or was it save_scrn() ?) that
- would be just the thing to use in that new function you have to write, but you
- just can't recall where you saw it. What to do?
- The traditional way of handling these problems is this: First, you use a
- text-searching program such as GREP (Unix) or TS (from The Norton Utilities)
- and search through all your source files for every reference to the function
- you are looking for until you finally locate the reference that is the
- function definition. Then, you bring up the source file in your editor. Next,
- you search for the function again until you finally find the function
- definition. By this time, the reason you were looking for it has usually
- slipped your mind.
- Having suffered through this situation several times over the years (and
- not just with other people's code), I resolved to find a better way. This
- article presents my solution: a function finder for MS-DOS. Using the function
- finder (actually, it consists of two programs) speeds up the process
- considerably. You simply invoke the finder with the function name as an
- argument. The program then scans an index file, invokes your editor using the
- appropriate source file, and, for sophisticated editors such as Brief, even
- positions the cursor at the function definition.
- The two programs that make up the function finder, BLDFUNCS and GETF, are
- written in C. Although I used Microsoft C (Version 5.0) to compile the
- programs presented here, the code could be ported to other compilers (or to
- Unix) with minimal changes. The editor I use is Brief, Version 2.01, but any
- other editor could be used.
-
- HOW IT WORKS
- As I mentioned earlier, my solution consists of two programs, BLDFUNCS
- and GETF. BLDFUNCS constructs an index by reading through all the C source
- files specified on the command line and then constructing a text file
- (FUNCS.TXT) that contains the name of each C source file read and a list of
- all functions defined therein. The resulting file looks something like this:
-
- file_1.c:
- function_1a
- function_1b
- . . .
- function_1z;
-
- file_2.c:
- function_2a
- . . .
-
- Once BLDFUNCS has built the index file, GETF can be run with a function
- name as a command-line argument. GETF scans FUNCS.TXT for function names
- matching the one specified (wildcard characters are allowed) and notes the
- name of the C source file in which each match resides. Next, GETF constructs
- a DOS command line invoking the editor of your choice with that source file
- as the file to be edited. A DOS exec call then replaces GETF in memory with
- the editor. For editors such as Brief, which also accept editor commands on
- the command line, even more is possible: you can specify a search command
- that positions the cursor at an occurrence of the function name, which, with
- luck, will be the actual definition of the sought-for function.
- In brief, then, you run BLDFUNCS once to construct FUNCS.TXT, and again
- whenever the distribution of functions among your source files changes
- significantly. If you then need that elusive function SAVE_SCREEN
- (SAVE_SCRN?), you just type GETF SAVE_SCR*. One of three things will then
- happen:
- 1. GETF will politely tell you that there is no such function in FUNCS.TXT.
- 2. GETF will find that there is exactly one function in FUNCS.TXT matching the
- mask SAVE_SCR*, and the next thing you will see is the desired source
- file in your editor, with the cursor positioned at the exact place where
- the function is defined (maybe).
- 3. GETF will find more than one match for SAVE_SCR* and will present you with
- a menu of such functions and the source files in which they are defined;
- after you choose one of these files, it will be presented in the editor
- as in case 2.
-
- BLDFUNCS is intended to read C source files written for almost any C
- compiler on the market; it assumes, however, that the files compile without
- errors.
-
- DEALING WITH EDITORS
- Now for some of the details. In an effort to land exactly at the function
- definition after the editor was invoked, I first tried instructing Brief to
- start at the beginning of the file and search forward for the function name.
- Most programmers, however, code so that functions are more likely to be refer-
- enced (in the source file in which they are defined) BEFORE they are defined
- rather than after. With this in mind, I wrote a custom Brief macro that
- starts at the end of the file and searches backward for the function name
- instead. This works much better: it lands me at the exact spot about 50
- percent of the time, and when it doesn't, I just press the key assigned to
- SEARCH_AGAIN to continue looking.
- The command line GETF uses to invoke the editor is built by using the DOS
- environment variable GETFEDIT, which allows users to customize GETF to their
- editor of choice. On my machine, my autoexec.bat file contains the line:
-
- set GETFEDIT=b -m'funcsrch %%s'%%s
-
- where funcsrch is the Brief macro mentioned earlier. The -m option tells Brief
- to run the quoted macro once it starts. Because command.com reduces each %% in
- a batch file to a single %, GETFEDIT itself contains %s twice; GETF replaces
- the first %s with the function name it is looking for and the second with the
- name of the file in which that function will be found.
- My Brief macro funcsrch is shown in Example 1. As you can see,
- Brief is programmed in a language that is roughly a cross between C and Lisp.
- The effect of this macro is first to position the cursor at the end of the
- file being edited and then to search backward for a specified string (the
- function name in this instance). I found this approach to be more effective for
- positioning the cursor exactly on the definition of the sought-for function
- rather than on a reference to it. The built-in macro end_of_buffer positions
- the cursor at the end of the buffer and then the search_back macro is used to
- look for the string s obtained from the command line invoking the macro. The
- built-in external variable _s_pat is then set to the same value as s so that
- subsequent invocations of the macro search_again (assigned to Shift-F5 on my
- machine) will continue to search backward for the function name in question
- (note also that the external variable _dir is set to 0 so that subsequent
- calls to search_again go backward, not forward).
-
- (macro funcsrch
-     (
-         ( string s )
-         ( extern _s_pat _dir center_line )
-
-         ( get_parm 0 s )
-         ( message "locating function %s ..." s )
-         ( end_of_buffer )
-         (if ( > (search_back s) 0 )
-             (
-                 ( message "function %s found" s )
-                 ( center_line )
-                 (sprintf _s_pat "%s" s )
-                 ( = _dir 0 )
-             )
-         ;else
-             (
-                 ( top_of_buffer )
-                 ( beep )
-                 ( message "function %s not found" s )
-             )
-         )
-     )
- )
-
-
- ***** EXAMPLE 1: A search macro for the Brief editor *****
-
- BUILDING THE FUNCTION LIST
- BLDFUNCS presented the thorniest problem: the program had to understand
- enough C syntax to extract the names of functions defined in a source file,
- yet ignore similar constructions such as function prototype declarations,
- which are used for argument type checking.
- The approach I decided upon was to apply several "filter functions" in
- succession, each simplifying the character stream obtained from the source
- file, until I was able to extract the function names. Explaining from the
- bottom up (which is essentially the order in which the program was coded), the
- lowest-level filter is filter_cmt, which reads the raw character stream and
- returns a character stream identical to its input except that all comments
- have been eliminated. (Each filter function behaves like fgetc: each takes a
- FILE pointer as input and returns characters, the value EOF, or special values
- marking structures that have been "collapsed".) filter_cmt is essentially a
- small finite-state machine whose states reflect whether you have just received
- an * (asterisk), a / (slash), or neither. Some C compilers allow nested
- comments, so filter_cmt maintains a cmt_level variable, which is incremented
- when a /* (slash, asterisk) is received and decremented when an */ (asterisk,
- slash) is received; characters are returned only while cmt_level is zero.
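- The comment filter can be sketched in modern C as follows. This is not the
- distributed BLDFUNCS code. The sketch assumes the filter reads the raw file
- with fgetc and uses ungetc for one character of lookahead, and it follows the
- flt_ naming described in the note at the top of this file.

```c
#include <stdio.h>

/* Sketch of a nesting-aware comment filter (not the original source).
   Like fgetc, each call returns the next character outside every comment,
   or EOF.  cmt_level counts how deeply nested the current comment is. */
int flt_cmt(FILE *fp)
{
    int cmt_level = 0;
    int c, next;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '/') {
            next = fgetc(fp);
            if (next == '*') {          /* slash-asterisk opens a comment */
                cmt_level++;
                continue;
            }
            if (next != EOF)
                ungetc(next, fp);       /* not a comment: push the lookahead back */
        } else if (c == '*' && cmt_level > 0) {
            next = fgetc(fp);
            if (next == '/') {          /* asterisk-slash closes one level */
                cmt_level--;
                continue;
            }
            if (next != EOF)
                ungetc(next, fp);
        }
        if (cmt_level == 0)
            return c;                   /* character lies outside any comment */
    }
    return EOF;
}
```

- A character is returned only when cmt_level is back to zero, so the depth
- never has to survive from one call to the next and a local variable suffices.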
- The next filter is filter_quotes, which reads the character stream returned
- by filter_cmt and replaces any quoted string (delimited by either single or
- double quotes) by the special value QUOTES, which is used as a place marker
- for the original string. Higher-level filters look for matching curly braces,
- for example, and would be confused by curly braces occurring within quotes.
- The only subtlety in filter_quotes is handling escape sequences (characters
- preceded by a backslash) correctly, so that a quoted string is not terminated
- prematurely by an escaped quote, as in the character constant '\''.
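- A sketch of the quote filter under the same assumptions (it reads with fgetc
- rather than from flt_cmt). The value chosen for QUOTES is hypothetical; any
- value outside the range 0 to 255 that differs from EOF would do.

```c
#include <stdio.h>

#define QUOTES 256   /* hypothetical marker: not a char value and not EOF */

/* Sketch of a quote filter: ordinary characters pass through, while a whole
   string literal or character constant collapses to one QUOTES value.
   A backslash causes the following character to be skipped unexamined, so
   an escaped quote cannot end the literal early. */
int flt_quotes(FILE *fp)
{
    int c = fgetc(fp);

    if (c == '"' || c == '\'') {
        int delim = c;                  /* the quote that opened the literal */
        while ((c = fgetc(fp)) != EOF && c != delim)
            if (c == '\\' && fgetc(fp) == EOF)
                break;                  /* file ended right after a backslash */
        return QUOTES;
    }
    return c;
}
```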
- Next in the hierarchy comes filter_ppdir, whose task is to read from the
- stream provided by filter_quotes and eliminate all preprocessor directives. (I
- have made the simplifying assumption that no function will be defined either
- via a #define directive or in an include file. Although such peculiarities are
- possible, they are rare and are not constructs found when good programming
- style is employed.) The gotcha to be avoided in this filter is that #define
- constructs may extend to several programming lines, so filter_ppdir is careful
- to scan for escape new-line sequences (a backslash followed immediately by a
- newline character) before deciding that a #define construct has terminated.
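- The preprocessor filter needs one piece of state that survives between
- calls: whether the current character begins a line. This sketch keeps it in
- a static variable and, as an assumption, treats # as a directive only when it
- is the first non-blank character on a line. Like the sketches above, it reads
- with fgetc directly.

```c
#include <stdio.h>

/* Sketch of a preprocessor-directive filter.  A directive is skipped up to
   the first newline that is not preceded by a backslash; that newline is
   returned so the line structure of the remaining text is preserved. */
int flt_ppdir(FILE *fp)
{
    static int at_line_start = 1;   /* only blanks seen so far on this line */
    int c = fgetc(fp);

    if (at_line_start && c == '#') {
        while ((c = fgetc(fp)) != EOF && c != '\n')
            if (c == '\\' && fgetc(fp) == EOF)
                break;              /* backslash-newline continues the directive */
        /* at_line_start stays 1: whatever comes next begins a new line */
        return c == '\n' ? '\n' : EOF;
    }
    if (c == '\n')
        at_line_start = 1;
    else if (c != ' ' && c != '\t' && c != EOF)
        at_line_start = 0;
    return c;
}
```

- Because at_line_start is static, a version that processes several files in
- turn would need a way to reset it at the start of each file.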
- Next, filter_curly_braces reads the stream returned by filter_ppdir and
- replaces all characters between matching curly brace pairs ( {} ) by the
- special value BRACES. This is accomplished by maintaining a brace count that
- begins at zero, is incremented when a left brace is encountered, and is
- decremented when a right brace is encountered. While the count is nonzero, no
- incoming character is returned to the caller.
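- The brace filter can be sketched the same way. Here the count becomes one as
- soon as the opening brace is read, BRACES is a hypothetical marker value, and
- the input again comes straight from fgetc rather than from flt_ppdir.

```c
#include <stdio.h>

#define BRACES 257   /* hypothetical marker: not a char value and not EOF */

/* Sketch of a curly-brace filter: everything from a '{' through its
   matching '}' collapses to one BRACES value; other characters pass
   through unchanged. */
int flt_curly_braces(FILE *fp)
{
    int c = fgetc(fp);
    int depth;

    if (c != '{')
        return c;
    depth = 1;                       /* counts the brace just read */
    while (depth > 0 && (c = fgetc(fp)) != EOF) {
        if (c == '{')
            depth++;
        else if (c == '}')
            depth--;
    }
    return BRACES;
}
```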
-
- GETTING DOWN TO FUNCTIONS
- At this point, the filtered input stream consists of external data items,
- function models used for type checking, and actual function definitions. To
- simplify the task further, filter_parens reads the character stream provided
- by filter_curly_braces and eliminates all characters between parentheses,
- returning instead the special value PARENS. This approach eliminates thorny
- problems in the declaration of formal function parameters - for example, in
- the function definition:
-
- int function1(a, b, c)
- int(*a)();
- char (*b)();
- int c;
- {
- }
-
- Some data initialization expressions can have constructions that resemble
- functions also - for example:
-
- int size = sizeof(array)/sizeof(element);
-
- To avoid such problems, filter_data reads the stream provided by
- filter_parens and deletes the equal sign and the right-hand side of any
- assignment, leaving just the semicolon.
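- filter_data is the simplest filter of all. In this sketch (reading with fgetc,
- an assumption), an equal sign starts a skip that ends at the next semicolon,
- which is returned in place of the whole initializer. In the full pipeline,
- parentheses and braces inside the initializer would already have been
- collapsed by the lower filters.

```c
#include <stdio.h>

/* Sketch of a data filter: "= ...;" is reduced to ";" so that
   initializers such as sizeof(array)/sizeof(element) cannot be
   mistaken for function definitions. */
int flt_data(FILE *fp)
{
    int c = fgetc(fp);

    if (c == '=')
        while ((c = fgetc(fp)) != EOF && c != ';')
            ;                        /* discard the right-hand side */
    return c;                        /* ';', EOF, or an ordinary character */
}
```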
- The stream returned by filter_data is now simple enough to be used to
- extract the names of defined functions. The routine get_names_one_file opens
- the file specified by the parameter source_filename, reads it via filter_data,
- and writes the resulting function names to the file specified by the
- parameter fp_out, the already opened FILE pointer to the output stream for
- FUNCS.TXT. It does this by storing characters from filter_data until EOF, a
- semicolon, or the PARENS symbol is encountered. You are really only interested
- in sequences that end with PARENS, because a sequence ending with a semicolon
- before any PARENS symbol cannot represent a function definition.
- The routine get_fn_name is now used to extract the function name from the
- stored line, by scanning backward from the PARENS symbol until a character is
- encountered that is neither an underscore nor an alphanumeric. The decision as
- to whether this is a function definition or a type-checking construction is
- made as follows: If the first non-white-space character encountered after the
- PARENS symbol is a semicolon or a comma, it is a type-checking construction.
- The comma might occur as follows:
-
- int func1(int, char), func2(int);
-
- If you have an actual function definition, you bypass all characters until the
- BRACES symbol is encountered because certainly no function definition can
- occur between the PARENS and BRACES symbols. It only remains to append the
- function name to the output file, FUNCS.TXT.
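- The backward scan in get_fn_name and the declaration test can be sketched as
- two small functions. The signature, which takes the stored line as an array
- of int so that it can hold the PARENS marker, is an assumption, as are the
- helper names is_ident_char, is_space_char, and is_declaration and the marker
- value chosen for PARENS.

```c
#include <ctype.h>

#define PARENS 258   /* hypothetical marker: not a char value and not EOF */

/* True for a real character (not a marker) that may appear in an identifier. */
static int is_ident_char(int c)
{
    return c >= 0 && c < 256 && (isalnum(c) || c == '_');
}

static int is_space_char(int c)
{
    return c >= 0 && c < 256 && isspace(c);
}

/* buf[0..len-1] holds one stored declaration ending with PARENS.  Skip any
   white space just before the marker, then scan backward over identifier
   characters and copy them into name.  Returns 1 if a name was found. */
int get_fn_name(const int *buf, int len, char *name, int size)
{
    int end = len - 1;               /* index of PARENS; the name ends before it */
    int start, n;

    while (end > 0 && is_space_char(buf[end - 1]))
        end--;
    start = end;
    while (start > 0 && is_ident_char(buf[start - 1]))
        start--;
    if (start == end)
        return 0;
    for (n = 0; start + n < end && n < size - 1; n++)
        name[n] = (char)buf[start + n];
    name[n] = '\0';
    return 1;
}

/* The rule from the text: a ';' or ',' as the first non-white-space
   character after PARENS marks a type-checking declaration. */
int is_declaration(int next_nonblank)
{
    return next_nonblank == ';' || next_nonblank == ',';
}
```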
-
- FINISHING OFF BLDFUNCS
- The main function of BLDFUNCS opens FUNCS.TXT and calls get_names_one_file
- for every source file specified on the command line. To enable expansion of
- wildcards on the command line, the module setargv (provided with the Microsoft
- C compiler) is linked in with BLDFUNCS (See the MAKE file in Example 2). This
- module lets you use a command such as:
-
- bldfuncs *.c \othersource\*.c
-
- which results in a FUNCS.TXT file containing the names of functions in every C
- source file in both the current directory and the directory \othersource.
-
-
- bldfuncs.obj: bldfuncs.c
-     cl /c bldfuncs.c
-
- bldfuncs.exe: bldfuncs.obj
-     link bldfuncs+\msc5\lib\setargv/ST:14000/NOE;
-
-
- ****** EXAMPLE 2: MAKE file for BLDFUNCS ******
-
- BLDFUNCS uses lots of stack space for local storage, so my MAKE file
- specifies a stack size of 14000 in the link step:
-
- link bldfuncs+\msc5\lib\setargv/ST:14000/NOE;
-
- THE SECOND HALF
- Now let's look at GETF, whose job is to scan FUNCS.TXT for a specified
- function name (or for all names matching a specified pattern) and to present
- the appropriate source file in the editor. As the first step in the process,
- GETF looks for the DOS environment variable GETFEDIT, which must contain the
- control string used to construct the command line that invokes the editor. For
- example, if your editor were named edit, you would first issue the DOS
- command:
-
- set GETFEDIT=edit %s
-
- and if this line were placed in your autoexec.bat file, it would look like:
-
- set GETFEDIT=edit %%s
-
- Note the use of the %% sequence to keep command.com from interpreting the
- % symbol.
- The standard library function strstr is used to verify that there is at
- least one occurrence of %s in GETFEDIT. (Given two strings s1 and s2,
- strstr(s1, s2) returns a character pointer to the first occurrence of s2 in s1
- or NULL if there is none.) Then, the string function strtok is used to scan
- GETFEDIT for an initial token (delimited by white space) that should be the
- program name of the editor. Recall that given two strings, s and delim,
- strtok(s, delim) scans s for a token ending with a character from the delim
- string, replaces this character with a null, and returns a character pointer
- to this token in s. Subsequent calls to strtok have the form strtok(NULL,
- delim) and return character pointers to subsequent tokens, finally returning
- NULL when no more tokens are to be had.
- The program name of the editor is stored in pgm_name, and the remainder of
- the GETFEDIT string is stored in arg1_ctl; these are used later in construc-
- ting an exec call to invoke the editor.
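- A sketch of this parsing step, using strstr and strtok as described. The
- function name parse_getfedit, its parameters, and the 256-byte working buffer
- are assumptions. The call strtok(NULL, "") uses an empty delimiter set, so it
- returns whatever remains of the string after the editor name.

```c
#include <stdio.h>
#include <string.h>

/* Split a GETFEDIT control string into pgm_name (the editor) and arg1_ctl
   (the argument template).  Returns 0 if there is no %s or no program name. */
int parse_getfedit(const char *env, char *pgm_name, size_t pgm_size,
                   char *arg1_ctl, size_t ctl_size)
{
    char buf[256];
    char *tok, *rest;

    if (env == NULL || strstr(env, "%s") == NULL)
        return 0;                        /* nowhere to put the names */
    strncpy(buf, env, sizeof buf - 1);   /* strtok writes into its argument */
    buf[sizeof buf - 1] = '\0';

    tok = strtok(buf, " \t");            /* first blank-delimited token */
    if (tok == NULL)
        return 0;
    rest = strtok(NULL, "");             /* the remainder of the line */
    if (rest == NULL)
        rest = "";
    rest += strspn(rest, " \t");         /* drop extra blanks after the name */

    snprintf(pgm_name, pgm_size, "%s", tok);
    snprintf(arg1_ctl, ctl_size, "%s", rest);
    return 1;
}
```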
- Next, GETF opens FUNCS.TXT and scans for file names, which end with a
- colon. Any such name is saved in file_token. Having obtained file_token, GETF
- scans for function names that match the pattern func_name (obtained from the
- GETF command line) using the function patn_match. Any such match is stored in
- the arrays func_choices and (corresponding) file_choices indexed by the
- integer num_choices.
- When this scan terminates, num_choices is examined to see if any matches
- were found. If there was no match, an appropriate message is printed and the
- editor is not invoked. If there was exactly one match, the function edit is
- invoked to bring up the specified file and function in the editor. If several
- matches occurred, the function ask_for_file is invoked to prompt the user for
- a choice among these, and then the function edit is invoked as in the case of
- one match.
- The function patn_match accepts a pattern and a string as arguments and
- returns TRUE or FALSE, depending upon whether or not a match occurred. The
- pattern-matching rules are tailored to the case of function identifiers as
- follows: a ? matches any single character, an * matches the remainder of any
- string, and a % matches any string up to the next underscore character or the
- end of the string.
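- The three matching rules translate into a short loop. This sketch is not the
- distributed patn_match. One assumption is labeled in the code: letters are
- compared without regard to case, because the article types SAVE_SCR* to find
- a function named save_screen.

```c
#include <ctype.h>

#define TRUE  1
#define FALSE 0

/* Sketch of patn_match(pattern, string):
     ?  matches any single character
     *  matches the rest of the string
     %  matches characters up to (not including) the next '_' or the end
   Assumption: letters compare case-insensitively. */
int patn_match(const char *pat, const char *str)
{
    for (; *pat; pat++) {
        switch (*pat) {
        case '*':
            return TRUE;                 /* the rest of the string always matches */
        case '?':
            if (*str == '\0')
                return FALSE;
            str++;
            break;
        case '%':
            while (*str && *str != '_')
                str++;                   /* skip one underscore-free piece */
            break;
        default:
            if (tolower((unsigned char)*pat) != tolower((unsigned char)*str))
                return FALSE;
            str++;
            break;
        }
    }
    return *str == '\0';                 /* pattern used up: string must be too */
}
```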
- The function edit accepts a function and a file as parameters and uses
- the previously obtained strings pgm_name and arg1_ctl (parsed from the
- GETFEDIT variable) as parameters for an execlp call, which, if successful,
- will overlay the currently executing program, GETF, with the editor. The p in
- execlp serves as a reminder that the DOS PATH variable is used to locate
- pgm_name.
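- The edit step amounts to filling in arg1_ctl and calling execlp. In this
- sketch the helper build_edit_args is hypothetical, arg1_ctl is assumed to hold
- exactly two %s (function name first, file name second, as in the Brief
- example), and <unistd.h> is the POSIX header for execlp; Microsoft C declared
- it in <process.h>.

```c
#include <stdio.h>
#include <unistd.h>   /* POSIX declaration of execlp */

/* Fill the arg1_ctl template: the first %s receives the function name,
   the second the file name. */
void build_edit_args(char *out, size_t size, const char *arg1_ctl,
                     const char *func, const char *file)
{
    snprintf(out, size, arg1_ctl, func, file);
}

/* Sketch of edit(): on success execlp never returns, because the editor
   replaces this program; on failure it returns -1. */
int edit(const char *pgm_name, const char *arg1_ctl,
         const char *func, const char *file)
{
    char args[256];

    build_edit_args(args, sizeof args, arg1_ctl, func, file);
    execlp(pgm_name, pgm_name, args, (char *)NULL);
    perror(pgm_name);                /* reached only if the exec failed */
    return -1;
}
```

- Under MS-DOS, exec joins its arguments into one command tail that the editor
- parses for itself, so passing the filled-in template as a single argument
- works there; a POSIX system would hand the editor that whole string as one
- argument instead.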
- The function ask_for_file uses the arrays func_choices and file_choices
- and the index variable num_choices to present the user with a menu of possible
- functions matching the pattern entered on the GETF command line and the corr-
- esponding files in which they reside. If the command getf get_%_data were
- issued, a typical menu might look like:
-
- Which one? (CR to exit)
- 1:get_all_data in getdata.c
- 2:get_good_data in valid.c
- 3:get_some_data in input.c
- Enter number:
-
- When a listed number is chosen, the edit function is invoked using the
- specified elements of func_choices and file_choices.
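- The menu can be sketched as a loop that redisplays the choices until it gets a
- valid number or a bare carriage return. The array and count names follow the
- article. The in and out stream parameters and the convention of returning -1
- for "no choice" are assumptions added so the routine can be exercised without
- a keyboard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of ask_for_file: returns the 0-based index of the chosen entry,
   or -1 if the user presses Enter alone or input ends. */
int ask_for_file(char *func_choices[], char *file_choices[], int num_choices,
                 FILE *in, FILE *out)
{
    char line[32];
    int i, n;

    for (;;) {
        fprintf(out, "Which one? (CR to exit)\n");
        for (i = 0; i < num_choices; i++)
            fprintf(out, "%d:%s in %s\n", i + 1, func_choices[i], file_choices[i]);
        fprintf(out, "Enter number: ");
        if (fgets(line, sizeof line, in) == NULL || line[0] == '\n')
            return -1;               /* EOF or bare CR: exit without editing */
        n = atoi(line);
        if (n >= 1 && n <= num_choices)
            return n - 1;
        /* out-of-range or non-numeric entry: show the menu again */
    }
}
```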
- In order to avoid any stack-space problems, a generous allocation of
- 14,000 stack bytes is made in the link step:
-
- link getf/ST:14000;
-
- The stack allocation of 14,000 bytes is the same for both BLDFUNCS and GETF.
-
- SUMMING UP
- I have found these two programs used in conjunction to be an excellent
- timesaver whenever I have to go wandering through source code files and have
- been using them extensively since I wrote them. The programs have saved me a
- great deal of time, and fall into the performance category of "fast enough."
- Although I've no doubt that there are more efficient ways of parsing through
- C source code than my multiple-filter strategy, the method I chose had two
- outstanding advantages over other methods: it was easy to program, and it was
- simple to debug.
- Of course, the techniques presented in this article could be generalized to
- languages other than C. GETF would require almost no changes, but a new
- version of BLDFUNCS would be required in order to do the parsing specific to
- the language desired. I'd be interested in hearing from anyone who does the
- conversion to another language.
-
- Marvin Hymowech works as a programmer for Condor Computer Corp. Previously he
- taught mathematics at the University of Michigan in Ann Arbor. He may be
- reached at 4906 Cole Blvd., Ypsilanti, MI 48197.
-
- ******* END OF TEXT *******
-
-